Well-established rules of translational initiation have served as a cornerstone of molecular biology: they are used
to understand gene expression, to frame fundamental questions about which proteins a cell synthesizes and how those
proteins work, and to predict the consequences of mutations. For a group of neurological diseases caused by the
abnormal expansion of short segments of DNA (e.g. CAG·CTG repeats), mutations within or outside of predicted
coding and non-coding regions are thought to cause disease by protein gain- or loss-of-function or RNA
gain-of-function mechanisms. In contrast to these predictions, the recent discovery of repeat-associated non-ATG
(RAN) translation showed that expansion mutations can express homopolymeric expansion proteins in all
three reading frames without an AUG start codon. This unanticipated, non-canonical type of protein translation
is length- and hairpin-dependent, takes place without frameshifting or RNA editing, and occurs across a variety of
repeat motifs. To date, RAN proteins have been reported in spinocerebellar ataxia type 8 (SCA8), myotonic
dystrophy type 1 (DM1), fragile X tremor ataxia syndrome (FXTAS) and C9ORF72 amyotrophic lateral
sclerosis/frontotemporal dementia (ALS/FTD). In this article, we review what is currently known about RAN
translation and recent progress toward understanding its contribution to disease.



INTRODUCTION 

Repeat-expansion disorders are a class of neurological and 
neuromuscular diseases caused by the expansion of short repetitive
elements within the human genome. The genic location of
the expansion has traditionally been used to classify these disorders
into coding expansions, which act through protein gain-of-function
effects, and non-coding expansions, which act through either
loss-of-function of the affected gene or RNA gain-of-function
effects (1-3). For protein gain-of-function diseases, the expansion
mutation is translated as part of a larger open reading frame
(ORF), resulting in the expression of a mutant protein that dis- 
rupts cellular function and induces toxicity. For example, in 
Huntington's disease (HD) the CAG expansion mutation is 
translated as part of the huntingtin protein, which results in 
protein aggregation and cellular dysfunction (4). For RNA 
gain-of-function disorders, non-coding expansion RNAs accu- 
mulate as nuclear foci that sequester RNA-binding proteins 
and lead to a loss of their normal function (5,6). For example,
in myotonic dystrophy type 1 (DM1) and type 2 (DM2), CUG
or CCUG expansion RNAs sequester MBNL proteins from 
their normal splicing targets, such that the resulting MBNL 
loss-of-function leads to alternative splicing dysregulation (7-10).
The recent discovery of repeat-associated non-ATG
(RAN) translation (11) showed that microsatellite expansions 
do not follow the canonical rules of translation initiation and 
can generate a series of unexpected repeat proteins. This 
finding opens the door to new paradigms in disease mechanisms 
and cell biology. In this review, we discuss the discovery of RAN 
translation, what is currently known about its molecular biology 
and progress toward understanding its contribution to disease. 

INITIAL DISCOVERY OF RAN TRANSLATION 
IN SCA8 

RAN translation was initially discovered by Zu et al. (11) while
investigating the molecular mechanisms of spinocerebellar
ataxia type 8 (SCA8). SCA8 is a dominantly inherited, slowly
progressive neurodegenerative disorder caused by a CTG·CAG
repeat expansion (12). Both RNA and protein disease mechanisms
likely operate in SCA8, as bidirectional transcription produces
both a CUG expansion transcript that forms RNA foci (13) and
a CAG expansion transcript with an unusual ATG-initiated
ORF encoding a nearly pure polyGln expansion protein
(Fig. 1A) (14). The first evidence for RAN translation came
when Zu et al., trying to separate the RNA and protein
gain-of-function effects, found that removing the only ATG initiation
codon within an SCA8 minigene did not prevent expression of
the polyGln protein (Fig. 1B) (11). Subsequent experiments
with epitope-tagged minigenes showed that CAG expansions
lacking an ATG initiation codon produce distinct homopolymeric
protein products in all three reading frames: polyGln, polyAla
and polySer (Fig. 1C). Because these findings were novel and
completely unexpected, multiple approaches were used to characterize
the transcripts and to establish the identity of these proteins.

Figure 1. RAN translation in spinocerebellar ataxia type 8 (SCA8). (A) Prior to the discovery of RAN translation, bidirectional transcription at the SCA8 locus was
known to produce RNA foci from the CUG expansion transcript and a polyglutamine expansion protein from the CAG expansion transcript (13,14). The CAG expansion
transcript has an unusual short ORF with an ATG initiation codon immediately upstream of the CAG expansion and a series of stop codons immediately
after the repeat (14). Both RNA and protein gain-of-function effects are evident in SCA8. (B) To separate the effects of the CUG expansion transcript from the polyGln
protein, the ATG immediately upstream of the CAG expanded repeat was mutated in an ATXN8 minigene (11). Unexpectedly, this mutation did not prevent the
expression of the polyglutamine protein and was the first indication of RAN translation. (C) Schematic diagram showing a CAG-repeat expansion expressing both
ATG-initiated polyGln and non-ATG-initiated polyGln, polyAla and polySer RAN proteins in all three reading frames.

Analysis of polyribosome-bound transcripts showed no evidence
of RNA editing that could have introduced a start codon
(11). Immunoprecipitation and analysis of C- and N-terminal
epitope-tagged constructs demonstrated that RAN translation 
does not require frameshifting, and that RAN occurs in all the 
reading frames even in the presence of an ATG-initiated ORF. 
Additionally, a combination of epitope tags, tritium labeling 
and mass spectrometry unequivocally proved that these proteins 
contain expanded polyAla, polyGln or polySer repeat tracts. For 
polyalanine, mass spectrometry identified a series of N-terminal 
peptides containing varying numbers of alanines. No peptides 
containing N-terminal methionine were identified. These data 
suggest that translation in the polyAla reading frame begins 
with an alanine, and that start sites occur at various positions 
throughout the length of the repeat tract. 

Additional experiments on RAN translation (11) demonstrated 
a number of features. First, immunofluorescence showed RAN 
proteins expressed from all the three reading frames can accumu- 
late in a single cell, although more frequently only one or two 
RAN proteins were found. Second, RAN proteins expressed 
across CAG expansions increase apoptosis, suggesting a potential 
contribution to disease. Third, RAN proteins are expressed across 
hairpin-forming CAG but not across non-hairpin-forming CAA
repeats in cell culture. These data suggest that structured RNAs
may be required for RAN translation. Fourth, RAN translation 
also occurs across CUG expansion transcripts. Fifth, longer 
CAG repeat tracts are associated with the simultaneous expression 
of multiple protein products with a different length threshold 
required for translation in each frame (Table 1). Taken together, 
these data demonstrated that CAG and CUG expansion transcripts 
undergo a novel type of protein translation in which homopoly- 
meric proteins are expressed in all the three reading frames 
without an ATG-initiation codon. 
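
The three homopolymeric products follow directly from reading the same repeat tract at three different offsets. The sketch below is illustrative only and is not the method used in the cited work: it applies the standard genetic code to an idealized, uninterrupted repeat and ignores flanking sequence, initiation sites and hairpin structure. The function name translate_frames and the repeat count of 10 are assumptions chosen for the example.

```python
from itertools import product

# Standard genetic code, listed for codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frames(unit, n_repeats):
    """Translate an idealized repeat tract in each of the three forward frames.

    `unit` may be given as RNA (with U) or DNA (with T); hypothetical helper.
    """
    tract = unit.upper().replace("U", "T") * n_repeats
    frames = []
    for offset in range(3):
        # Only complete codons are translated; trailing bases are dropped.
        codons = [tract[i:i + 3] for i in range(offset, len(tract) - 2, 3)]
        frames.append("".join(CODON_TABLE[c] for c in codons))
    return frames


cag_frames = translate_frames("CAG", 10)  # Gln (Q), Ser (S) and Ala (A) tracts
cug_frames = translate_frames("CUG", 10)  # products of the CUG direction
print(cag_frames)
print(cug_frames)
```

For the CAG tract, the three offsets read CAG, AGC and GCA codons, which correspond to the polyGln, polySer and polyAla products described above.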



IN VIVO EVIDENCE FOR RAN TRANSLATION 
IN SCA8 

After establishing that RAN translation occurs in transfected cells, 
Zu et al. looked for evidence that SCA8 RAN proteins are 
expressed in vivo (11). SCA8 is characterized by severe cerebellar
atrophy with Purkinje cell degeneration and loss of granule cells
(14). Zu et al. (11) developed antibodies against the unique
C-terminal region of the predicted CAG-encoded polyAla RAN
protein and showed polyAla-positive immunostaining in cerebellar
Purkinje cells from human SCA8 but not control autopsy
tissue. These antibodies also detected polyAla RAN proteins in 
Purkinje cells from an established mouse model of SCA8 (14). 
SCA8 Purkinje cells are also known to accumulate CUG RNA 
foci (13) and polyGln inclusions (14). Although additional
studies are needed to understand the effects of RAN proteins in 
SCA8, the accumulation of the SCA8 polyAla protein in Purkinje 
cells suggests that RAN proteins may contribute to disease. 

RAN TRANSLATION IN MYOTONIC DYSTROPHY 
TYPE 1 

Additional in vivo evidence for RAN translation was demonstrated
in myotonic dystrophy (11). DM1, one of the best examples
of an RNA gain-of-function disease (5), is caused by a CTG
expansion in the 3' UTR of the DMPK gene (15-17). Antisense
transcripts in the CAG direction have also been reported (11,18).
To determine whether RAN translation also occurs in DM1,
Zu et al. (11) performed immunostaining with two types of antibodies:
(i) a well-established monoclonal antibody that detects
expanded polyGln tracts (19,20) and (ii) a novel antibody developed
to detect the unique C-terminal region of the predicted
CAG-encoded polyGln RAN protein (11). Positive immunostaining
was observed in DM1 myoblasts, skeletal muscle and
blood. Similar staining was found in an established DM1
mouse model (21,22), which showed staining of cardiomyocytes
and leukocytes (11). Additionally, in both humans and mice,
polyGln aggregates co-localized with caspase-8, an early indicator
of polyGln-induced apoptosis (23). Although RNA
gain-of-function effects in DM1 are known to cause specific
alternative splicing changes, these results suggest the possibility
that RAN translation may also contribute to this disorder.

Table 1. In vitro characteristics and in vivo detection of RAN translation

| Disorder | Repeat | RAN protein (in vitro) | Threshold (in vitro) | Tissue (in vivo) | Refs. |
|----------|--------|------------------------|----------------------|------------------|-------|
| SCA8 | CAG | polyGln | >42 repeats | ATG-polyGln, cerebellum and brain stem (14) | (11) |
| | | polyAla | >73 repeats | Cerebellum | (11) |
| | | polySer | >58 repeats | ND | (11) |
| DM1 | CAG (a) | polyGln | ND | Myoblasts, skeletal muscle, peripheral blood leukocytes | (11) |
| FXTAS | CGG | polyGly | >30 repeats | Frontal cortex, cerebellum, hippocampus | (24) |
| | | polyAla | >88 repeats | ND | (24) |
| | | polyArg | UD | ND | (24) |
| ALS/FTD | GGGGCC | polyGlyPro | >145 repeats | Cerebellum, hippocampus, iPSC-derived neurons, neocortex, medial and lateral geniculate nuclei, testes | (25-27) |
| | | polyGlyAla | >38 repeats | Cerebellum, hippocampus | (26) |
| | | polyGlyArg | UD | Cerebellum, hippocampus | (26) |

ND, not examined and not determined; UD, examined but undetermined.
(a) Antisense transcript.

Figure 2. Model of RAN translation across repeats in coding and non-coding gene regions. Schematic diagram showing mutations located in intronic or exonic regions
with expression of distinct RAN proteins in three frames from sense and antisense directions. For expansions in introns, sense and antisense transcripts (not shown)
produce RAN proteins with different repeat motifs and distinct C-terminal regions not corresponding to any endogenous proteins. For repeat-expansion mutations
located in ORFs, up to six distinct RAN proteins may be produced from sense and antisense transcripts (see upper inset for antisense RAN proteins). The RAN
protein expressed in the ORF is predicted to start at or close to the repeat and contain the same C-terminal region as the protein expressed from the canonical
ATG-initiated ORF. RAN proteins will vary depending on whether they are expressed from sense or antisense transcripts, on the repeat motif and on the
C-terminal sequence.

The discovery of RAN translation combined with growing evi- 
dence that many microsatellite expansion mutations are trans- 
cribed in both directions (2) suggests that in addition to previously 
considered gene products, expansion mutations may also express 
up to six additional RAN proteins (Fig. 2) — each of which may 
contribute to disease (Fig. 3). Consistent with this prediction, 
RAN translation has recently been reported in two additional disorders:
fragile X-associated tremor ataxia syndrome (FXTAS)
(24) and C9ORF72 amyotrophic lateral sclerosis/frontotemporal
dementia (ALS/FTD) (25-27).
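
The count of "up to six" products comes from three reading frames on each of two complementary transcripts. A minimal sketch of that bookkeeping follows, assuming an idealized pure repeat and the standard genetic code; the helper names (reverse_complement, repeat_residues, six_frame_products) are hypothetical, and "sense" here simply means the strand as typed, not a statement about any particular gene.

```python
from itertools import product

# Standard genetic code, listed for codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def repeat_residues(unit, n_repeats=12):
    """For each forward frame, return the distinct residues encoded by the tract."""
    tract = unit * n_repeats
    result = []
    for offset in range(3):
        residues = {CODON_TABLE[tract[i:i + 3]] for i in range(offset, len(tract) - 2, 3)}
        result.append("".join(sorted(residues)))
    return result


def six_frame_products(dna_unit):
    """Residues encoded in three frames of the typed strand and of its complement."""
    return {
        "sense": repeat_residues(dna_unit),
        "antisense": repeat_residues(reverse_complement(dna_unit)),
    }


products = six_frame_products("CAG")
print(products)
```

For a CAG·CTG repeat the six frames encode five distinct homopolymers, because an alanine tract is encoded from both strands; this is one reason the number of distinct products is an upper bound rather than a fixed count.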
Figure 3. Potential pathways of pathogenesis of repeat-associated disorders. Bidirectional transcription of an expanded repeat will produce two transcripts
(blue = antisense, red = sense), each potentially capable of structure formation and contributions to pathogenesis. In the RNA toxicity model (first upper and
lower panels, light gray), the structures formed by the expanded repeats sequester cellular RNA-binding proteins, thereby interrupting their normal cellular function.
The expanded repeats and sequestered proteins may form foci, which may contribute to toxicity or serve a protective function. The proteins sequestered will depend on
the structures formed by the RNAs and protein affinity to the structures. For example, expanded CUG transcripts in DM1 sequester the MBNL family of splicing factors,
which leads to a loss of MBNL function and alternative splicing abnormalities. In the protein gain-/loss-of-function model (second upper panel, medium gray), the
ATG-initiated production of expanded proteins may (1) disrupt or overwhelm cellular pathways (i.e. proteasomes or autophagy) designed to clear aberrant proteins,
directly contribute to cellular apoptosis or damage, or aggregate or form inclusions that serve a protective function or exacerbate toxicity; or (2) disrupt the normal function
of the protein. For example, in Huntington's disease, the mutant huntingtin protein disrupts multiple regulatory pathways, including transcription, the ubiquitin
proteasomal system, autophagy and synaptic transmission. The discovery of RAN translation has added a third potential pathway for disease (upper and lower panels, dark
gray). Up to six additional repeat-containing proteins may be produced from the expanded sense and antisense transcripts. RAN proteins may contribute to
pathogenesis in a similar or even amplified manner as the protein gain-/loss-of-function pathway. RAN proteins are found within affected patient tissues, suggesting that they
contribute to disease.



RAN TRANSLATION AND THE CGG REPEATS OF 
FRAGILE X TREMOR ATAXIA SYNDROME (FXTAS) 

Fragile X-associated tremor ataxia syndrome (FXTAS) is a 
late-onset disorder that primarily affects the cerebellum and 
causes coordination deficits and cognitive decline (28-30). 
This is caused by a specific range of expanded CGG repeats 
(55-200 repeats) within the 5' UTR of the FMR1 gene (28), 
whereas longer repeats (>200 CGGs) are associated with the clinically
distinct Fragile X syndrome (31). In contrast to the transcriptional
silencing and loss of protein expression in Fragile X syndrome
(32), FXTAS is associated with increased CGG transcripts that 
accumulate as RNA foci in human autopsy tissue (33). The asso- 
ciated increased mRNA expression, neurodegeneration and 
CGG-repeat containing neuronal inclusions (33-35) suggested 
an RNA gain-of-function mechanism. However, not all aspects 
of disease pathology, such as inclusion size and associated pro- 
teins (34,36), are readily explained by this mechanism. Recent 
work by Todd et al. (24) has shown that RAN translation may 
explain some of these incongruous aspects of FXTAS pathology. 
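
The repeat-length ranges quoted above can be written as a simple lookup. This is a sketch of the ranges as stated in this section only (55-200 CGGs for FXTAS, >200 CGGs for Fragile X syndrome); the function name and the wording of the returned labels are assumptions, and the sketch is not a diagnostic rule.

```python
def classify_fmr1_cgg(n_repeats):
    """Map an FMR1 CGG repeat count onto the ranges quoted in the text.

    Hypothetical helper: boundaries are taken as inclusive for 55-200 and
    strict for >200, which is one reading of the ranges given above.
    """
    if n_repeats > 200:
        return "Fragile X syndrome range (>200 CGGs)"
    if n_repeats >= 55:
        return "FXTAS range (55-200 CGGs)"
    return "below the FXTAS range"


for n in (30, 88, 200, 201):
    print(n, classify_fmr1_cgg(n))
```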

Initially, Todd et al. (24) noticed aggregates in a fly model
designed to express a non-coding CGG expansion mutation upstream
of a GFP reporter. This group performed a series of experiments
to understand the molecular basis of these aggregates and to test 
the hypothesis that FXTAS CGG expansion mutations undergo 
RAN translation. First, they showed evidence from Drosophila, 
including mass spectrometry, that a high-molecular weight 
fusion protein is expressed that contains a homopolymeric 
glycine expansion. Second, in transfected mammalian cells 
they showed CGG expansions trigger RAN translation in at 
least two out of three reading frames producing polyGly-GFP 
and polyAla-GFP fusion proteins. Third, in the polyAla
frame, RAN translation is length dependent, with polyAla
detected using constructs with 88 but not 30 CGG repeats. In contrast,
polyGly expression occurred with 88, 50 and 30 CGGs
(Table 1). While the polyGly protein was produced from constructs
containing only 30 repeats, aggregation was only associated
with longer repeat tracts. Fourth, these authors
performed a number of experiments that indicate translation ini- 
tiation can begin upstream of the CGG repeat in the polyGly 
reading frame. Fifth, these authors show evidence that the 
polyGly RAN protein accumulates as aggregates in several 
model systems and in human FXTAS brains using several 
custom C-terminal antibodies. In summary, Todd et al. (24) 
provide strong evidence that FXTAS CGG expansions undergo
RAN translation, and that at least one of the predicted homopo- 
lymeric RAN proteins accumulates in FXTAS brains. 



RAN TRANSLATION AND C9ORF72 ALS

A large G4C2 hexanucleotide repeat expansion in intron 1 of the 
C9orf72 gene was recently identified as the most common cause 
of ALS/FTD (37,38). Repeat tracts in unaffected controls typically
contain fewer than 23 G4C2 repeats, while expansions in ALS/FTD
patients range from hundreds to more than 1000 repeats
(37-39). Initially, haploinsufficiency and RNA gain-of-function
were suggested as possible disease mechanisms because the
expansion mutation decreases C9ORF72 transcript levels and
G4C2 expansion transcripts form RNA foci (37). Two recent 
studies suggest RAN translation as a third possible mechanism 
(25,26). 

RAN translation of the C9ORF72 G4C2 hexanucleotide
expansion mutation is predicted to result in the expression of dipeptide
repeat proteins: GlyPro (GP), GlyArg (GR) and GlyAla (GA).
Two groups developed antibodies to these predicted dipeptide
motifs and used them to look for in vivo evidence of RAN
translation in patient tissues (25,26). Mori et al. (26) used
antibodies to all three predicted dipeptide products, while Ash
et al. (25) focused on the GP frame. Both groups performed
a detailed examination of patient tissues and showed that these
antibodies recognize inclusions in C9ORF72 ALS/FTD autopsy
tissue. In the Mori et al. study (26), the GA antibody, and to a
much lesser extent the GP and GR antibodies, detected inclusions
in the cerebellum, hippocampus and other brain regions.
These inclusions were similar in shape and abundance to typical
ALS/FTD inclusions (40) and colocalized with p62 but
not phospho-TDP-43 (26). Inclusions that are p62-positive/
phospho-TDP-43-negative are classic features of ALS/FTD pathology
(40-43). In the Ash et al. study (25), the GP antibodies
detected widespread neuronal cytoplasmic and intranuclear inclusions
throughout the central nervous system. These inclusions
were also morphologically similar to classic ALS inclusions
(25). In both studies, these antibodies did not detect aggregates
in C9ORF72-negative disease controls (25,26). More recently,
Almeida et al. (27) showed that neurons derived from C9ORF72-positive
iPS cells have GP-positive aggregates, elevated p62
levels and an increased sensitivity to cellular stress induced by
autophagy inhibitors. Taken together, data from these studies
suggest that dipeptide repeat proteins, expressed by RAN translation,
contribute to ALS/FTD.
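
Because the G4C2 unit is six nucleotides long, each reading frame cycles through two codons and therefore encodes a dipeptide repeat rather than a homopolymer. The sketch below derives the repeating peptide motif for each frame of an idealized tract using the standard genetic code; repeat_motif is a hypothetical helper written for this illustration and does not model initiation or flanking sequence.

```python
from itertools import product

# Standard genetic code, listed for codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def repeat_motif(unit, frame, n_repeats=12):
    """Shortest peptide motif that tiles the translation of a repeat tract."""
    tract = unit * n_repeats
    peptide = "".join(
        CODON_TABLE[tract[i:i + 3]] for i in range(frame, len(tract) - 2, 3)
    )
    for size in range(1, len(peptide) + 1):
        motif = peptide[:size]
        # Accept the first prefix whose repetition reproduces the whole peptide.
        if (motif * len(peptide))[:len(peptide)] == peptide:
            return motif
    return peptide  # only reached if the tract is too short to translate


g4c2_motifs = [repeat_motif("GGGGCC", frame) for frame in range(3)]
cgg_motifs = [repeat_motif("CGG", frame) for frame in range(3)]
print(g4c2_motifs)  # dipeptide motifs of the sense G4C2 tract
print(cgg_motifs)   # homopolymers of a trinucleotide tract, for comparison
```

The three G4C2 frames give the GA, GP and GR motifs named above, while the trinucleotide CGG tract gives single-residue motifs (Arg, Gly, Ala), matching the products listed for FXTAS in Table 1.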



COMMON THEMES IN RAN TRANSLATION 

RAN translation has now been reported in four diseases and has
been shown to occur across four different types of repeat motifs:
CAG, CUG, CGG and GGGGCC. Among this diversity, several
common themes are emerging. First, RAN translation is repeat
length-dependent, with translation more likely for longer expansion
mutations. Second, RAN translation in different
reading frames has different length thresholds, such that
longer repeats are more likely to result in the accumulation of
a cocktail of RAN proteins expressed from different reading
frames. It is possible that the simultaneous expression of RAN
proteins across long repeats may play a role in anticipation,
the earlier onset and increased disease severity associated with
longer repeats. Third, all RAN-competent repeat motifs described
to date form unusual secondary structures (44-48).
Fourth, all disorders in which RAN translation has been reported
to date have neurological features.
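
The second theme, frame-specific length thresholds producing a growing "cocktail" of products, can be illustrated with the SCA8 CAG values printed in Table 1. The sketch assumes those in vitro thresholds are strict lower bounds (translation reported above the listed repeat number); they were measured in specific constructs and are used here only as illustrative numbers. The names SCA8_CAG_THRESHOLDS and expected_ran_proteins are hypothetical.

```python
# In vitro thresholds for CAG repeats as printed in Table 1 (SCA8 rows).
# Assumption: a product is expected only when the repeat count exceeds the value.
SCA8_CAG_THRESHOLDS = {"polyGln": 42, "polySer": 58, "polyAla": 73}


def expected_ran_proteins(n_repeats, thresholds=SCA8_CAG_THRESHOLDS):
    """List products whose reported threshold is exceeded, lowest threshold first."""
    above = [name for name, limit in thresholds.items() if n_repeats > limit]
    return sorted(above, key=thresholds.get)


for n in (40, 50, 60, 100):
    print(n, expected_ran_proteins(n))
```

With these numbers, a tract of 50 repeats is expected to yield one product, 60 repeats two and 100 repeats all three, which is the pattern the text describes for longer expansions.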



NEXT STEPS IN RAN TRANSLATION 

What are the critical next steps in RAN translation research? 
From the analysis so far, it is clear that research needs to move 
beyond the observational and into the mechanistic. For 
example, what are the precise RNA structural, sequence and 
protein factor requirements for RAN translation? Answering 
these questions will yield important clues to the breadth and 
scope of RAN translation. Future analysis also needs to be 
extended beyond immunological approaches to more detailed 
structural and biochemical analyses of RAN translation proteins 
in disease. Antibody-based techniques are often subject to 
artifacts and technical problems, which may be particularly 
problematic for antibodies directed against repeat motifs themselves.
Additionally, multiple approaches will be necessary to
validate results, especially given the possibility of overlap 
between RAN translation and other cellular processes. For 
example, the products of RAN translation and frameshifting 
may appear to be identical when looking at regions only down- 
stream of the repeat motif. Given the discovery of RAN transla- 
tion, previous reports of frameshifting for disorders such as 
SCA3 and HD (49-51) warrant re-examination. A more
general question is whether RAN translation occurs across all microsatellite
expansion diseases and, if so, when, where and why.
Additional studies will be required to sort out which RAN pro- 
teins are toxic and their potential contribution to disease. 



CONCLUSIONS 

In summary, RAN translation is a novel mechanism that impacts 
our basic understanding of gene expression, cell biology and 
disease. Because more than 30 diseases are caused by microsatellite
expansion mutations, RAN translation may produce an
abundant, yet previously unrecognized, set of mutant proteins
that contribute to a large category of neurological diseases. Additionally,
recent evidence from ribosome profiling studies
(24,52-58) suggests that translation is more widespread than 
previously appreciated. Furthermore, because >50% of the 
human genome consists of repetitive DNA and repetitive, 
hairpin-forming sequences undergo RAN translation, the dis- 
covery of RAN translation could reveal an abundant, yet previ- 
ously unrecognized category of repeat-containing proteins. 